An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

Authors

Mohamed Morchid

Mohamed Bouallegue

Richard Dufour

Georges Linarès

Driss Matrouf

Renato De Mori

Abstract

Various studies have highlighted that topic-based approaches give a powerful representation of the spoken content of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework based on a compact representation of a textual document, to solve issues related to topic space granularity. First, various topic spaces are estimated with different numbers of classes from a Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called c-vector, originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm. Our identification system reaches an accuracy of 85%, with a significant gain of 9 points compared to the baseline (best single topic space configuration).

Related articles

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulties in u...

A Class of compact operators on homogeneous spaces

Let $\varpi$ be a representation of the homogeneous space $G/H$, where $G$ is a locally compact group and $H$ is a compact subgroup of $G$. For an admissible wavelet $\zeta$ for $\varpi$ and $\psi \in L^p(G/H)$, $1 \leq p < \infty$, we determine a class of bounded compact operators which are related to continuous wavelet transforms on homogeneous spaces; they are called localization operators.

Short Text Classification Improved by Learning Multi-Granularity Topics

Understanding the rapidly growing short text is very important. Short text is different from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more...

Publication date: 2014
